PREFACE


The  Human  Predictive  Reasoning  for  Group  Interactions  research  effort  was  sponsored 
by  the  Air  Force  Research  Laboratory’s  (AFRL),  Sensemaking  and  Organizational  Effectiveness 
Branch  (711  HPW/RHXS)  under  Task  Order  #14  of  the  Technology  for  Agile  Combat  Support 
(TACS) contract (FA8650-D-6546). The period of performance for the research effort extended
from 10 January 2008 to the revised contract end date of 9 August 2010. This report documents the results of
research  activities  conducted  as  part  of  this  task  order. 
1.0 SUMMARY


The ultimate goal of this project was to develop a model for instantiation into the National
Operational  Environment  Model  (NOEM)  that  permits  the  capture  of  changes  in  group  attitudes 
and  behavior  as  functions  of  changing  environmental  variables.  The  ‘behavior  module’  in 
NOEM  is  expected  to  publish  affinity  measures  between  group  entities  over  time.  The  word 
‘affinity’  here  implies  a  ‘natural  liking  to’  or  ‘attraction  to’,  so  that  if  a  certain  in-group  has  a 
high  level  of  affinity  towards  a  given  out-group,  the  in-group  has  a  natural  attraction  towards  that 
out-group.  For  example,  a  population  might  transition  from  an  “unsupportive”  state  to  a 
“supportive”  state,  say,  towards  the  indigenous  government,  in  a  relatively  short  period  of  time  if 
it  perceives  that  its  quality  of  life  will  be  improved  by,  say,  the  election  of  a  new  political  figure. 
The  model  developed  in  this  effort  effectively  captures  the  dynamic  nature  of  collective  and/or 
individual  behavior  as  a  function  of  changing  variables  in  the  operating  environment  (OE).  In 
addition  to  instantiation  into  the  NOEM,  the  methods  and  procedures  outlined  in  this  report  can 
be  used  as  stand-alone  methods  for  conducting  causal  or  correlation  studies  (the  former 
implying  a  properly  designed  experiment  is  used  in  the  process  of  data  collection)  involving 
groups. For example, one might be interested in determining whether group-level and/or
individual-level characteristics (i.e., factors) provide any predictive power for some collective or
individual-level outcome.

There  are  two  primary  contributions  of  this  effort.  The  first  is  the  development  of  a  novel 
statistical  methodology  for  analyzing  group  and/or  individual-level  constructs,  where  the  goal  is 
to  estimate  the  functional  relationships  between  the  constructs  and  one  or  more  characteristics  of 
groups  or  individuals,  including  attitudes  towards  other  groups.  The  model  is  especially  useful 
in  situations  where  observational  studies  or  designed  experiments  involving  groups  and  their 
members  are  conducted,  say  using  survey  response  data  (e.g.,  Likert-items).  The  proposed 
analysis  method  employs  a  random  effects  model  in  conjunction  with  a  data-based  approach  for 
determining an appropriate transformation of the ‘response’ variable in order to correct for
violations of the underlying model assumptions. We develop an algorithm for parameter
estimation using the method of maximum likelihood, and suggest an approximate test for
determining the statistical significance of the independent variables considered. The proposed
model takes into account the non-independence between responses obtained from members
belonging to the same group. That is, members of the same group are assumed to share common
goals,  values,  culture,  beliefs,  etc.,  and  as  such,  one  might  expect  the  responses  obtained  from 
members  randomly  selected  from  the  same  group  to  be  more  similar  than  responses  obtained 
from  two  individuals  that  were  randomly  selected  from  the  entire  population  of  individuals.  If 
this  non-independence  is  neglected  by  the  use  of,  say,  standard  analysis  of  variance  (ANOVA)  or 
regression  procedures  (which,  unfortunately,  is  quite  common  in  practice),  the  results  of  the 
analysis  can  be  misleading.  Specifically,  collective  or  group-level  independent  variables  may 
appear  to  be  statistically  significant  when  they  are  not,  and  individual-level  independent 
variables  that  are  in  fact  significant  can  go  undetected.  Once  fitted  models  are  obtained  using  the 
proposed  approach,  one  can  then  instantiate  these  into  the  NOEM  as  meta-models. 

The  second  contribution  of  this  effort,  the  Markov  Affinity  Model  with  Bayesian  Updates 
(MAMBU),  involves  an  alternative  approach  to  modeling  group  constructs  (i.e.,  affinity,  trust, 
etc.)  as  a  function  of  the  changing  state  of  the  environment.  The  model  is  especially  useful  when 
historical  data  (or  subject  matter  expert  knowledge  when  empirical  data  is  lacking)  is  available 
on  the  conditional  distribution  of  “important”  environmental  variables,  given  the  behavioral  state 
of the groups under study. The term “important” implies that only those variables that are
assumed to affect group attitudes/behavior are considered, perhaps determined a priori via an
appropriate  statistical  analysis.  The  intent  of  MAMBU  is  primarily  for  instantiation  into  the 
NOEM.  The  approach  taken  by  MAMBU  models  the  probability  distribution  assigned  to  a 
group’s  behavioral  state  space  over  time  using  a  discrete  time  Markov  chain,  and  updates  this 
probability  distribution  whenever  new  information  becomes  available  using  a  Bayesian  updating 
procedure.  The  purpose  of  the  Markov  chain  is  to  serve  as  a  prior  probability  distribution  over 
the  behavioral  state  space;  that  is,  prior  to  obtaining  any  new  information.  Once  new 
information becomes available (i.e., current environmental conditions are observed), the
posterior probability distribution assigned to the group’s behavioral state space is computed.
The posterior distribution is the probability distribution assigned to the behavioral state
space conditional on current environmental conditions. The affinity scores between
groups  are  computed  at  any  time  t,  and  are  taken  to  be  an  expectation  across  the  posterior 
probability  distribution  assigned  to  the  behavioral  state  space  at  time  t.  In  addition,  to  account 
for  the  fact  that  individuals  (and  hence,  groups)  retain  memory  of  past  events,  we  geometrically 
weight the posterior probability distributions (and thus, the affinities) over time. We provide
several  examples  demonstrating  how  MAMBU  is  applied  and  discuss  alternative  approaches  to 
populating  MAMBU  using  empirical  data  and  subject  matter  expert  knowledge. 
2.0 INTRODUCTION


Following (Nezlek & Zyzniewski, 1998), we define groups as collections of
individuals  that  occur  either  naturally,  such  as  work  groups  in  organizations,  or  arbitrarily,  such 
as  groups  created  in  experiments.  Further,  group-level  phenomena  are  defined  as  variables  or 
outcomes  that  exist  only  at  a  group  or  aggregate  level  (e.g.,  unemployment  rate,  violent  death 
rate,  crime  rate).  In  contrast,  individual-level  phenomena  are  defined  as  variables  or  outcomes 
that  exist  at  the  individual  level  (e.g.,  age,  sex,  education). 

Prior to the development of a behavior model for NOEM, one must first have a means of
understanding how group-level (or collective individual) responses are functionally related to
relevant  variables,  where  the  relevant  variables  can  be  cast  across  both  group  and  individual 
levels.  This  can  be  accomplished  via  appropriate  statistical  analyses  of  observational  and/or 
experimental  data  that  might  be  available  to  the  analyst.  In  empirical  studies  of  groups,  response 
data  is  sampled  from  individual  group  members,  and  thus,  it  is  hierarchical  in  nature,  as  pointed 
out  by  (Nezlek  &  Zyzniewski,  1998)  and  others  (e.g.,  see  (Anderson  &  Ager,  1978),  (Draper, 
1995),  (Faris  &  Brown,  2003),  (Forsyth,  1998),  (Hoyle  &  Crawford,  1994),  (Hoyle,  Georgesen, 
&  Webster,  2001),  (Kenny  &  Voie,  1985),  (Quillian,  1995),  (Raudenbush,  1995),  (Raudenbush 
&  Willms,  1995)).  For  example,  one  might  be  interested  in  determining  the  effect  of  a  host 
population’s  unemployment  rate  and  skill  level  of  individuals  on  the  individual  or  collective 
attitudes  and  behaviors  towards  immigrants.  Note  that  skill  level  is  an  individual-level  factor  and 
unemployment  rate  is  an  aggregate  or  group-level  factor.  Since  we  are  dealing  with  groups, 
individuals  are  then,  by  definition,  nested  units  of  observation;  that  is,  nested  within  groups. 
Nested  data  structures,  such  as  that  considered  in  this  effort,  present  several  problems  from  an 
analysis  point  of  view. 

The  problem  studied  in  this  effort  is  one  involving  human  group  processes.  In  particular, 
interest  lies  in  drawing  statistical  inference  on  variables  that  influence  human  group  processes, 
whether  at  the  individual  or  group  level.  In  the  past,  this  task  has  been  accomplished  primarily 
via  standard  ANOVA  and/or  multiple  linear  regression  methods,  such  as  those  discussed  in 
(DeMaris,  2004).  However,  people  that  exist  within  groups  (e.g.,  students  in  schools,  members 
of  churches,  residents  of  villages,  members  of  organizations,  etc.)  tend  to  be  more  homogeneous 
in  their  beliefs,  goals,  culture,  etc.,  than  people,  say,  randomly  selected  from  the  entire 
population of humans. This suggests that group membership induces positive correlation (i.e.,
non-independence)  between  responses  obtained  from  any  two  people  belonging  to  the  same 
group.  Unfortunately,  this  complicates  the  problem  from  a  statistical  analysis  point  of  view,  and 
often  renders  results  obtained  from  standard  statistical  analyses  questionable. 

The  problem  of  non-independent  observations  has  been  addressed  by  several  authors  in 
the  social  science  literature,  e.g.,  see  (Hoyle,  Georgesen,  &  Webster,  2001),  and  a  variety  of 
solutions  have  been  proposed.  One  class  of  solutions  attempts  to  circumvent  the  statistical 
problem  altogether.  Some  of  these  strategies  are  discussed  in  (Hoyle  &  Crawford,  1994). 
Although  these  strategies  can  serve  to  eliminate  the  problem  of  non-independence  of 
observations,  their  use  is  limited  primarily  to  laboratory  research  and  research  questions  that 
focus  on  individual  group  members  as  opposed  to  group  process  and  behavior.  Since  our  interest 
lies  in  drawing  inference  on  collective  behavior,  we  do  not  view  this  class  of  solutions  as  viable. 

Another  class  of  solutions  is  statistical  in  nature.  A  common  strategy  in  this  class 
involves  the  ANOVA  design  discussed  in  (Anderson  &  Ager,  1978).  This  method  essentially 
corrects the variance estimates needed for performing valid statistical tests, but at the
cost of statistical power, i.e., some factors can go undetected. A major pitfall of this method is
that it does not generally allow for drawing inference at both the group and individual levels.
For studies in which group-level effects are present, researchers are advised to study only the
group level, to the exclusion of the individual. Another approach along these lines is
(Schiffenbauer,  Schulman,  &  Poe,  1978).  Since  a  defining  goal  in  sociology  is  the  study  of  both 
the  group  and  individual,  these  models  are  rather  limited  in  their  usefulness  within  the  field  of 
sociology. 

In  another  approach,  (Kenny  &  Voie,  1985)  proposed  a  statistical  technique  that  includes 
both  individual  and  group-level  effects.  These  authors  treat  the  simultaneous  study  of 
individuals  and  groups  as  an  exercise  in  construct  validity,  as  defined  by  (Cronbach  &  Meehl, 
1955).  Their  model  provides  for  the  estimation  of  individual-  and  group-level  correlations, 
where  if  individual  level  correlations  are  found  to  exist,  a  hierarchical  ANOVA  strategy  as 
recommended  by  (Myers,  1972)  is  used.  On  the  other  hand,  if  these  correlations  are  found  to  be 
null,  then  standard  ANOVA  methods  are  used  at  the  individual-level.  Although  the  model 
proposed  by  (Kenny  &  Voie,  1985)  considers  the  non-independence  of  individuals  belonging  to 
the  same  group,  their  method  of  analysis  relies  heavily  on  the  fact  that  individuals  are  randomly 
assigned to groups. This is certainly a plausible assumption in experimental and laboratory
situations  where  randomization  poses  no  problem.  However,  in  cases  for  which  natural  groups 
are  studied,  randomization  in  this  manner  usually  does  not  occur.  Another  limitation  is  that  their 
method  does  not  allow  for  the  estimation  and  test  of  interactions  between  group-  and  individual- 
level  independent  variables.  This  is  a  significant  limitation  as  many  times  these  interactions  are 
highly  significant  predictors.  Additional  limitations  of  this  model  are  pointed  out  in  (Moritz  & 
Watson,  1998). 

A more sophisticated strategy involves hierarchical linear modeling (Bryk &
Raudenbush, 1992), which permits correct variance estimates for inference purposes and allows
for  simultaneous  hypothesis  testing  at  both  the  group  and  individual  levels.  This  is  the  approach 
we  take  in  this  effort.  In  particular,  we  propose  a  hierarchical  linear  model  for  analyzing  group 
and/or  individual-level  constructs  as  functions  of  group  and/or  individual-level  factors.  The 
model  considers  the  possibility  of  group-level  main  effects  and  interactions,  individual-level 
main  effects  and  interactions,  as  well  as  interaction  effects  involving  group-level  variables  and 
individual-level  variables.  Further,  it  is  not  required  that  individuals  be  randomly  assigned  to 
groups.  In  fact,  the  proposed  method  is  quite  general,  and  can  be  used  with  experimental  data,  as 
well  as  observational  data.  We  derive  maximum  likelihood  estimates  for  the  unknown 
parameters  of  the  model  and  propose  a  method  for  testing  the  significance  of  the  factor  effects  at 
all  levels  of  analysis.  The  proposed  statistical  method  has  generality  well  outside  the  scope  of 
NOEM,  to  include  the  statistical  characterization  and  prediction  of  group  attitudes  and  behavior 
as  a  function  of  relevant  factors.  However,  we  recommend  a  strategy  for  instantiation  into  the 
NOEM  using  meta-models  fit  by  way  of  the  proposed  method. 

As  an  alternative  to  the  above  mentioned  statistical  model,  we  also  developed  the 
MAMBU.  MAMBU  is  our  first  attempt  at  modeling  ‘affinity’  between  groups  within  the 
NOEM  framework.  It  is  particularly  useful  when  historical  data  (or  subject  matter  expert 
knowledge)  is  available  on  the  conditional  distributions  of  “important”  environmental  variables, 
given  the  behavioral  state  of  the  groups  under  study.  The  term  “important”  implies  that  only 
those  variables  that  are  assumed  to  affect  group  attitudes/behavior  are  considered,  perhaps 
determined  a  priori  via  an  appropriate  statistical  analysis.  The  approach  taken  by  MAMBU 
models  the  probability  distribution  assigned  to  a  group’s  behavioral  state  space  over  time  using  a 
discrete time Markov chain, and updates this probability distribution whenever new information
becomes available using a Bayesian updating procedure. An advantage to this approach relative
to  the  hierarchical  modeling  approach  discussed  above  is  that  the  probability  distribution  of  the 
behavioral  state  space  of  the  groups  is  captured  at  each  time  t.  As  a  result,  action  sets  can  be 
mapped  to  the  behavioral  state  space  at  each  time  t  quite  easily  (which  is  desirable  within 
NOEM).  Additionally,  the  statistical  analysis  required  to  populate  MAMBU  is  rather 
straightforward,  relative  to  the  analysis  required  for  the  proposed  hierarchical  modeling 
technique. 
3.0 METHODS, ASSUMPTIONS, AND PROCEDURES
3.1  Statistical  Model 

In this subsection, a novel statistical model that permits correct testing of group-level and
individual-level  effects  on  group  and/or  individual-level  constructs  is  developed,  including 
model  specification,  estimation  and  inference. 

3.1.1  Model  Specification 

Consider  the  following  statistical  model 

$$y_i = 1_{n_i} z_i'\beta + X_i\gamma + W_i\eta + 1_{n_i} b_i + \varepsilon_i$$

where 

• $y_i$ denotes an $n_i \times 1$ vector of responses obtained from the $i$th group.

• $z_i$ denotes a $p \times 1$ vector of group-level covariate values for the $i$th group.

• $\beta$ denotes a $p \times 1$ vector of unknown group-level effects.

• $X_i$ denotes an $n_i \times k$ matrix of individual-level covariate values nested within the $i$th
group.

• $\gamma$ denotes a $k \times 1$ vector of unknown individual-level effects.

• $W_i$ denotes an $n_i \times q$ matrix of covariate values corresponding to group-level by
individual-level interaction effects.

• $\eta$ denotes a $q \times 1$ vector of unknown group-level by individual-level interaction effects.

• $b_i$ is a scalar-valued random effect and is assumed to follow a normal distribution with
zero mean and variance $\sigma_b^2$. Additionally, the $b_i$'s are assumed to be independent over
the index $i$.

• $\varepsilon_i$ is the error term corresponding to within-group variation and is modeled as
multivariate normal with zero mean vector and variance-covariance matrix $\sigma^2 I_{n_i}$.

Here $n_i$ denotes the number of observations (i.e., individuals) sampled from the $i$th group,
and $1_{n_i}$ denotes an $n_i \times 1$ vector of ones. Lastly, it is assumed that the covariance
between $b_i$ and $\varepsilon_i$ is zero. This is a common and reasonable assumption in practice.

Note that for the proposed model, we have

$$E(y_i) = 1_{n_i} z_i'\beta + X_i\gamma + W_i\eta$$

and

$$\mathrm{Var}(y_i) = V_i = \sigma_b^2\, 1_{n_i} 1_{n_i}' + \sigma^2 I_{n_i},$$

which implies the following covariance structure

$$\mathrm{Cov}(y_{ij}, y_{i'j'}) = \begin{cases} \sigma_b^2 + \sigma^2 & i = i',\ j = j' \\ \sigma_b^2 & i = i',\ j \neq j' \\ 0 & i \neq i' \end{cases}$$

suggesting that responses measured from individuals belonging to the same group have
covariance of $\sigma_b^2$, while responses measured from individuals belonging to different groups have
zero covariance. The unknown parameter vectors ($\beta$, $\gamma$, and $\eta$), as well as the unknown
variance components ($\sigma_b^2$ and $\sigma^2$) need to be estimated from a sample. For example, one might
use a sample of survey response data collected from the individual group members.
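
The covariance structure above can be illustrated with a minimal sketch. It assumes the within-group covariance matrix has the form sigma_b2 * 1 1' + sigma2 * I; the function names and the numeric variance components are hypothetical, chosen only for illustration. The ratio sigma_b2 / (sigma_b2 + sigma2) is the correlation this structure implies between two distinct members of the same group.

```python
def group_covariance(n_i, sigma_b2, sigma2):
    """Return V_i = sigma_b2 * 1 1' + sigma2 * I as an n_i x n_i nested list."""
    return [[sigma_b2 + (sigma2 if j == k else 0.0) for k in range(n_i)]
            for j in range(n_i)]


def intraclass_correlation(sigma_b2, sigma2):
    """Correlation between two distinct responses from the same group."""
    return sigma_b2 / (sigma_b2 + sigma2)


# Hypothetical variance components for a group with three sampled members.
V = group_covariance(3, sigma_b2=0.5, sigma2=1.5)
icc = intraclass_correlation(0.5, 1.5)
```

A larger between-group variance relative to the within-group variance pushes this correlation toward one, which is the situation in which ignoring group membership is most damaging.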

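The above model can be written more compactly as

$$y_i = A_i\lambda + 1_{n_i} b_i + \varepsilon_i$$

where $b_i$ and $\varepsilon_i$ are as defined above, and

$$A_i = \begin{bmatrix} 1_{n_i} z_i' & X_i & W_i \end{bmatrix}$$

denotes an $n_i \times r$ dimensional matrix, where $r = p + k + q$, and where

$$\lambda = \begin{bmatrix} \beta' & \gamma' & \eta' \end{bmatrix}'$$

is an $r \times 1$ unknown parameter vector.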
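
As a rough illustration of the compact form, the sketch below assembles the design matrix for one group by placing the group-level covariates (repeated for every member), the individual-level covariates, and the interaction covariates side by side. The function name and the example values (an unemployment rate, a skill level, and their product as the cross-level interaction) are assumptions made for illustration only.

```python
def design_matrix(z_i, X_i, W_i):
    """Stack [1 z_i' | X_i | W_i] row by row for one group.

    z_i: list of p group-level covariates (shared by every member).
    X_i: n_i rows of k individual-level covariates.
    W_i: n_i rows of q group-by-individual interaction covariates.
    """
    if len(X_i) != len(W_i):
        raise ValueError("X_i and W_i must have one row per group member")
    return [list(z_i) + list(x_row) + list(w_row)
            for x_row, w_row in zip(X_i, W_i)]


# Hypothetical group of three members.
z = [0.08]                              # group-level: unemployment rate
X = [[1.0], [3.0], [2.0]]               # individual-level: skill level
W = [[z[0] * row[0]] for row in X]      # cross-level interaction (product)
A = design_matrix(z, X, W)              # 3 x (p + k + q) = 3 x 3
```

Each row of the result corresponds to one group member, so the number of columns equals r = p + k + q.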

A critical assumption in the above model is that the response vectors (i.e., the $y_i$'s) are
independent, each following a multivariate normal distribution with mean vector $\mu_i = E(y_i)$ and
variance-covariance matrix $V_i$. Unfortunately, for many applications involving the study of
groups, this assumption can be grossly violated (e.g., Likert-scale data). As a result, we propose
a data-driven approach involving a power transformation on the original response variable to
“force” the data to appear normal. Subsequent analysis is then performed in the transformed
domain, and the results are transformed back into the original units for interpretation purposes.
3.1.2 Model Estimation


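Suppose that $y_i$ is as assumed in the above model. That is, the $y_i$'s are independently
distributed as multivariate normal with mean vector $\mu_i$ and variance-covariance matrix $V_i$. Then,
given the observed vector of responses $y = (y_1, y_2, \ldots, y_N)$, the log-likelihood function for
$\phi = (\phi_1, \phi_2) = (\sigma_b^2, \sigma^2)$ and $\lambda$ is given, up to an additive constant, by

$$\ell(\phi, \lambda \mid y) = -\frac{1}{2}\sum_{i=1}^{N} \log|V_i| - \frac{1}{2}\sum_{i=1}^{N} (y_i - A_i\lambda)' V_i^{-1} (y_i - A_i\lambda)$$

In what follows, note that $V_i$ can be written as

$$V_i = \phi_1 V_{i1} + \phi_2 V_{i2}$$

where $V_{i1} = 1_{n_i} 1_{n_i}'$ and $V_{i2} = I_{n_i}$. We first differentiate the log-likelihood function with respect
to the unknown parameters $\phi_u$ ($u = 1, 2$), or

$$\frac{\partial \ell(\phi, \lambda \mid y)}{\partial \phi_u} = -\frac{1}{2}\sum_{i=1}^{N} \left[ \mathrm{tr}\left(V_i^{-1} V_{iu}\right) - \rho_{iu} \right]$$

where

$$\rho_{iu} = (y_i - A_i\lambda)' V_i^{-1} V_{iu} V_i^{-1} (y_i - A_i\lambda)$$

Note further that we can write

$$\mathrm{tr}\left(V_i^{-1} V_{iu}\right) = \phi_1 \omega_{iu1} + \phi_2 \omega_{iu2}$$

where

$$\omega_{ium} = \mathrm{tr}\left(V_i^{-1} V_{iu} V_i^{-1} V_{im}\right)$$

for $m = 1, 2$. Therefore, if we define the $2 \times 2$ matrices $\Omega_i = \{\omega_{ium}\}$ ($i = 1, \ldots, N$), then the
likelihood equations for any given $\lambda$ are

$$\left(\sum_{i=1}^{N} \Omega_i\right)\phi = \sum_{i=1}^{N} \rho_i$$

where

$$\rho_i = \begin{bmatrix} (y_i - A_i\lambda)' V_i^{-1} 1_{n_i} 1_{n_i}' V_i^{-1} (y_i - A_i\lambda) \\ (y_i - A_i\lambda)' V_i^{-1} I_{n_i} V_i^{-1} (y_i - A_i\lambda) \end{bmatrix}$$

and thus the estimate for $\phi$ is obtained iteratively from

$$\hat{\phi}(\lambda) = \left(\sum_{i=1}^{N} \Omega_i\right)^{-1} \left(\sum_{i=1}^{N} \rho_i\right)$$

The likelihood equations with respect to $\lambda$ are easily shown to be

$$\left(\sum_{i=1}^{N} A_i' V_i^{-1} A_i\right)\lambda = \sum_{i=1}^{N} A_i' V_i^{-1} y_i$$

so that the estimate for $\lambda$ given $\phi$ is

$$\hat{\lambda}(\phi) = \left(\sum_{i=1}^{N} A_i' V_i^{-1} A_i\right)^{-1} \sum_{i=1}^{N} A_i' V_i^{-1} y_i$$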
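
As a computational aside that is not part of the report's derivation, the inverse and determinant needed by these estimating equations have simple closed forms when the within-group covariance matrix is assumed to be $V_i = \sigma_b^2 1_{n_i} 1_{n_i}' + \sigma^2 I_{n_i}$. The following is the standard Sherman-Morrison identity applied to that form, so no numerical matrix inversion is needed for each group:

$$V_i^{-1} = \frac{1}{\sigma^2}\left(I_{n_i} - \frac{\sigma_b^2}{\sigma^2 + n_i\sigma_b^2}\, 1_{n_i} 1_{n_i}'\right), \qquad |V_i| = \left(\sigma^2\right)^{n_i - 1}\left(\sigma^2 + n_i\sigma_b^2\right)$$

One way to check the first identity is to multiply it by $V_i$ and use $1_{n_i}' 1_{n_i} = n_i$; the second follows because $V_i$ has eigenvalue $\sigma^2 + n_i\sigma_b^2$ once and eigenvalue $\sigma^2$ with multiplicity $n_i - 1$.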

Consider the case where the response data is non-normal. Suppose there exists a
transformation on the $y_i$'s such that the transformed $y_i$'s are independent and follow
multivariate normal distributions with mean vectors $\mu_i$ and variance-covariance matrices $V_i$.

In this effort, we consider the class of power transformations

$$y_{ij}^{(\theta)} = \begin{cases} \dfrac{y_{ij}^{\theta} - 1}{\theta} & \theta \neq 0 \\ \log_e(y_{ij}) & \theta = 0 \end{cases}$$

where $\theta$ denotes the transformation parameter. Since the transformed $y_i$'s are assumed to be
independent and follow multivariate normal distributions with mean vector $\mu_i$ and
variance-covariance matrix $V_i$, the log-likelihood function for the untransformed response is then,
up to an additive constant,

$$\ell(\theta, \lambda, \phi) = (\theta - 1)\sum_{i=1}^{N}\sum_{j=1}^{n_i} \log_e(y_{ij}) - \frac{1}{2}\sum_{i=1}^{N} \log_e|V_i| - \frac{1}{2}\sum_{i=1}^{N} \left(y_i^{(\theta)} - A_i\lambda\right)' V_i^{-1} \left(y_i^{(\theta)} - A_i\lambda\right)$$

and the goal is to find the values of $\theta$, $\lambda$, and $\phi$ that maximize this likelihood function. To
accomplish this, one can perform the following steps:
3. Use $V_i^{(0)}$ and $\hat{\lambda}^{(1)}(\theta)$ to evaluate $\Omega_i$ and $\rho_i(\theta)$ ($i = 1, \ldots, N$), then compute

$$\hat{\phi}^{(1)} = \left(\sum_{i=1}^{N} \Omega_i\right)^{-1} \left(\sum_{i=1}^{N} \rho_i(\theta)\right)$$

4. Return to Step 2 and iterate until some convergence criterion is met.
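
For a fixed value of the transformation parameter, the iteration can be sketched as below. Steps 1 and 2 are not shown in this section, so the sketch assumes that Step 1 picks starting variance components (here both set to 1.0, an arbitrary choice) and that Step 2 computes the generalized least squares estimate of the regression vector from the estimating equation for that vector. All function names are hypothetical, the responses are assumed to be already transformed, and the closed-form inverse of sigma_b2 * 1 1' + sigma2 * I is used in place of a general matrix inverse.

```python
def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * p for a, p in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def vinv_times(v, sigma_b2, sigma2):
    """Return V_i^{-1} v for V_i = sigma_b2 * 1 1' + sigma2 * I (closed form)."""
    shrink = sigma_b2 * sum(v) / (sigma2 + len(v) * sigma_b2)
    return [(x - shrink) / sigma2 for x in v]


def update_lambda(groups, phi):
    """Assumed Step 2: generalized least squares estimate of lambda given phi."""
    r = len(groups[0][1][0])
    lhs = [[0.0] * r for _ in range(r)]
    rhs = [0.0] * r
    for y, A in groups:
        cols = [[row[c] for row in A] for c in range(r)]
        vinv_y = vinv_times(y, *phi)
        for a in range(r):
            vinv_col = vinv_times(cols[a], *phi)
            rhs[a] += sum(p * q for p, q in zip(cols[a], vinv_y))
            for b in range(r):
                lhs[b][a] += sum(p * q for p, q in zip(cols[b], vinv_col))
    return solve(lhs, rhs)


def update_phi(groups, lam, phi):
    """Step 3: solve (sum Omega_i) phi = sum rho_i with V_i evaluated at phi."""
    sigma_b2, sigma2 = phi
    omega = [[0.0, 0.0], [0.0, 0.0]]
    rho = [0.0, 0.0]
    for y, A in groups:
        n = len(y)
        tau = sigma2 + n * sigma_b2      # eigenvalue of V_i along the ones vector
        resid = [yj - sum(a * l for a, l in zip(row, lam))
                 for yj, row in zip(y, A)]
        u = vinv_times(resid, sigma_b2, sigma2)
        rho[0] += sum(u) ** 2            # e' V^-1 1 1' V^-1 e
        rho[1] += sum(x * x for x in u)  # e' V^-1 I V^-1 e
        omega[0][0] += (n / tau) ** 2
        omega[0][1] += n / tau ** 2
        omega[1][0] += n / tau ** 2
        omega[1][1] += (n - 1) / sigma2 ** 2 + 1.0 / tau ** 2
    return tuple(solve(omega, rho))


def fit(groups, phi=(1.0, 1.0), tol=1e-10, max_iter=200):
    """Alternate Steps 2 and 3 until the variance components stop changing."""
    for _ in range(max_iter):
        lam = update_lambda(groups, phi)
        new_phi = update_phi(groups, lam, phi)
        converged = max(abs(a - b) for a, b in zip(new_phi, phi)) < tol
        phi = new_phi
        if converged:
            break
    return update_lambda(groups, phi), phi


# Hypothetical balanced data: three groups of two members, intercept only.
groups = [([1.0, 3.0], [[1.0], [1.0]]),
          ([5.0, 7.0], [[1.0], [1.0]]),
          ([9.0, 11.0], [[1.0], [1.0]])]
lam_hat, phi_hat = fit(groups)
```

For this balanced, intercept-only example the fitted intercept is the grand mean of the responses. The sketch does not guard against variance estimates leaving the parameter space, which a production implementation would need to handle.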


Thus, one could perform the above steps for a range of values for $\theta$, each time
substituting the resulting estimates of $\lambda$ and $\phi$ into the likelihood function given above, and
retain the value of $\theta$ that maximizes this function. We suggest using values of $\theta$ in the
interval [-1, 1] in increments of, say, 0.50. Values in this set include the inverse, square-root,
natural log, and inverse square-root transformations, as well as the case of no transformation
(i.e., $\theta = 1$). Values of $\theta$ outside of this range are more difficult to interpret in practice.
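
The grid of candidate transformations can be sketched as follows. The function names and the small set of Likert-style responses are hypothetical; the second function computes only the first term of the log-likelihood, (theta - 1) times the sum of the log responses, which is the part that depends on the untransformed data.

```python
import math


def power_transform(y, theta):
    """Power transformation of a single positive response y."""
    if y <= 0:
        raise ValueError("response must be positive")
    return math.log(y) if theta == 0 else (y ** theta - 1.0) / theta


def jacobian_term(groups, theta):
    """(theta - 1) * sum_i sum_j log(y_ij), the first log-likelihood term."""
    return (theta - 1.0) * sum(math.log(y) for group in groups for y in group)


# Candidate values: inverse, inverse square root, log, square root, none.
THETA_GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]

likert = [[1, 2, 4], [3, 5]]   # hypothetical responses from two groups
transformed = {t: [[power_transform(y, t) for y in g] for g in likert]
               for t in THETA_GRID}
```

In a full analysis, each transformed data set would be passed to the estimation procedure and the value of theta giving the largest likelihood would be retained.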

Note that the transformation proposed in this research requires $y_{ij}$ to be a positive
number. However, in cases where $y_{ij}$ is negative, one can include a positive constant $c$ so
that the transformation becomes

$$y_{ij}^{(\theta, c)} = \begin{cases} \dfrac{(y_{ij} + c)^{\theta} - 1}{\theta} & \theta \neq 0 \\ \log_e(y_{ij} + c) & \theta = 0 \end{cases}$$

where $c$ is chosen so that $y_{ij} + c > 0$ for all $i$ and $j$.
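
One simple way to pick the constant is sketched below. The rule used here (shift so that the smallest response equals a chosen positive margin) is an assumption for illustration; the report only requires that every shifted response be positive. Function names and data are hypothetical.

```python
import math


def choose_shift(groups, margin=1.0):
    """Return c such that y_ij + c >= margin > 0 for every response."""
    smallest = min(y for group in groups for y in group)
    return max(0.0, margin - smallest)


def shifted_power_transform(y, theta, c):
    """Power transformation applied to the shifted response y + c."""
    shifted = y + c
    if shifted <= 0:
        raise ValueError("y + c must be positive")
    return math.log(shifted) if theta == 0 else (shifted ** theta - 1.0) / theta


responses = [[-2.0, 0.0], [1.5]]       # hypothetical data with a negative value
c = choose_shift(responses)            # shifts the minimum up to the margin
```

Because the shift changes the scale of the response, the same constant has to be subtracted again when fitted values are mapped back to the original units.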

Suppose the model has been fit using the iterative procedure outlined above. Then the
fitted values in the original units are given by

$$\hat{y}_{ij} = \exp\left\{ \frac{\log_e\left(1 + \hat{\theta} A_{ij}'\hat{\lambda}\right)}{\hat{\theta}} \right\}$$

for $i = 1, \ldots, N$ and $j = 1, \ldots, n_i$, where $A_{ij}$ is an $r \times 1$ vector of covariate values corresponding to the
$j$th response nested within the $i$th group.
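
The back-transformation can be sketched as a small helper. The argument eta stands for the fitted linear predictor in the transformed domain; the name is hypothetical. The branch for theta equal to zero is the standard limiting case (the inverse of the natural log) and is an addition to the formula given above, which covers nonzero theta.

```python
import math


def inverse_power_transform(eta, theta):
    """Map a fitted value on the transformed scale back to original units."""
    if theta == 0:
        return math.exp(eta)             # limiting case: inverse of log
    base = 1.0 + theta * eta
    if base <= 0:
        raise ValueError("1 + theta * eta must be positive")
    return math.exp(math.log(base) / theta)


# Round trip for a hypothetical response of 4 under the square-root case.
eta = (4.0 ** 0.5 - 1.0) / 0.5           # forward transform gives 2.0
y_back = inverse_power_transform(eta, 0.5)
```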


3.1.3 Model Inference


Suppose one is interested in testing the hypotheses $H_0: \lambda_h = 0$ versus $H_1: \lambda_h \neq 0$. Since
$\hat{\lambda}$ is a maximum likelihood estimate for $\lambda$, we can derive an approximate test. If the variance
components are known, it is easily shown that

$$\mathrm{Var}(\hat{\lambda}) = \left(\sum_{i=1}^{N} A_i' V_i^{-1} A_i\right)^{-1}$$

and evaluating $V_i$ using $\hat{\phi}$ (i.e., the maximum likelihood estimate for $\phi$) to obtain $\hat{V}_i$, we have the following test
statistic

$$z_0 = \frac{\hat{\lambda}_h}{\sqrt{\left[\left(\sum_{i=1}^{N} A_i' \hat{V}_i^{-1} A_i\right)^{-1}\right]_{hh}}}$$

which is asymptotically standard normal under $H_0$. Thus, for a given level of $\alpha$ (i.e., type I
error rate), an approximate test involves computing $|z_0|$ and comparing to the upper $\alpha/2$ quantile
of the standard normal distribution.
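
A minimal sketch of the approximate test is given below, assuming the point estimate of one coefficient and its estimated variance (the corresponding diagonal element of the estimated covariance matrix of the coefficient vector) are already available. The function name and the numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def wald_z_test(estimate, variance, alpha=0.05):
    """Two-sided approximate z test of H0: coefficient = 0."""
    z0 = estimate / math.sqrt(variance)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # upper alpha/2 quantile
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z0)))
    return z0, critical, p_value, abs(z0) > critical


# Hypothetical coefficient estimate 0.9 with estimated variance 0.09.
z0, critical, p_value, reject = wald_z_test(0.9, 0.09)
```

Because the variance components are estimated rather than known, the normal reference distribution is only approximate, particularly when the number of groups is small.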

The  model  developed  in  this  section  can  be  used  as  a  stand-alone  analysis  tool  in 
situations where the researcher is interested in performing correlational studies (via observational
data)  or  causation  studies  (via  a  designed  experiment)  between  individual  and/or  group  level 
outcomes  (or  responses,  constructs,  etc.)  and  one  or  more  characteristics  of  groups,  group 
members,  interacting  groups,  or  the  settings  within  which  groups  function. 

The  next  subsection  develops  an  alternative  approach  to  modeling  the  relationship 
between  group  constructs  and  important  variables.  This  approach  is  more  probabilistic  in  nature, 
and  is  especially  useful  when  interest  lies  in  mapping  a  group’s  behavioral  state  to  a  set  of 
possible actions. It exploits the use of historical information and subject matter expert opinion
that  might  be  available  on  the  conditional  distribution  of  important  environmental  variables, 
given  the  observed  attitudes  and  behaviors  of  groups  under  study. 


3.2  Markov  Affinity  Model  with  Bayesian  Updates  (MAMBU) 

In  this  subsection  we  discuss  the  Markov  Affinity  Model  with  Bayesian  Updates 
(MAMBU). MAMBU is our initial effort at modeling changes in ‘between-group’ affinities
within the NOEM framework, although its use extends beyond NOEM. It is anticipated
that, for prediction purposes, a statistical model will be employed to help the researcher
determine which environmental factors most highly correlate with some group construct of
interest, e.g., affinity, alliance, prejudice, trust, etc. The objective here is to determine a smaller
subset of factors (from some larger pool) that are useful in explaining variations in group
constructs. To study these constructs, one might examine the influence of both group-level
variables (e.g., unemployment rate, crime rate, violent death rate) and individual-level variables
(e.g., age, sex, occupation) on these constructs using the analysis approach discussed in the
previous  subsection.  However,  this  problem  is  simplified  within  the  NOEM  framework  since 
there  are  no  individual-level  variables;  rather,  only  group-level  variables  exist.  As  a  result, 
standard  regression  methods  can  be  used  to  populate  MAMBU  so  long  as  all  variables  included 
in  the  study  are  aggregated  at  the  group-level.  However,  NOEM’s  emphasis  on  only  one  level  of 
analysis  (i.e.,  group-level)  may  be  a  severe  limitation.  This  is  because  both  individual-  and 
group-level  processes  occur  in  group  settings  (Kenny  &  Voie,  1985),  (Moritz  &  Watson,  1998), 
thus  NOEM  may  not  permit  the  precise  modeling  of  group  dynamics  since  it  does  not  retain 
information  at  the  individual  level. 

The  goal  of  MAMBU  is  to  assign  a  probability  distribution  to  the  behavioral  state  space 
of  a  group  towards  some  construct  of  interest  (e.g.,  attitude  towards  immigrants,  support  for 
indigenous government, etc.) as a function of changes in relevant environmental variables (e.g.,
unemployment  rate,  rate  of  grievances  redressed,  etc.).  For  example,  suppose  that  a  regional 
populace  is  under  study,  and  interest  lies  in  determining  how  the  population’s  affinity  towards 
the  indigenous  government  changes  as  a  result  of  changes  in  important  variables  (which  are 
perceived  by  the  population  to  be  controlled  by  the  indigenous  government).  In  what  follows, 
the  technical  development  of  MAMBU  is  discussed. 

Let $A_{(ij)} = \{S_{(ij)1}, \ldots, S_{(ij)n_i}\}$ denote the behavioral state space of group $i$ towards group $j$,
where $S_{(ij)l}$ denotes the $l$th behavioral state of group $i$. Let $X_t = [X_{1t}, \ldots, X_{kt}]$ denote a
$k$-dimensional vector representing the state of the environment at time $t$. We desire a probability
distribution over the state space $A_{(ij)}$ for each $i, j = 1, \ldots, m$, conditioned on the observed
state of the environment at time $t$, or $X_t$. Thus, we have $m$ groups in the environment, $k$
environmental variables, and $n_i$ behavioral states corresponding to group $i$. Let $p_{(ij),t}$ denote the
probability distribution assigned to $A_{(ij)}$ at time $t$, and let $Q_{(ij)}$ denote a transition probability
matrix corresponding to the behavioral state space $A_{(ij)}$. Given an initial state probability
distribution over $A_{(ij)}$, say $p_{(ij),0}$, the prior probability distribution over $A_{(ij)}$ at time $t$ is computed
from

$$\left[P_t\left(S_{(ij)l}\right)\right] = p_{(ij),t} = p_{(ij),t-1}\, Q_{(ij)}$$

for $t = 1, 2, \ldots$, suggesting that our prior knowledge with respect to how a group transitions between its
behavioral states over time is adequately modeled by a discrete-time Markov chain. Suppose
that at time $t$ we observe the state of the environment $X_t$. Then the posterior (i.e., updated)
probability distribution over $A_{(ij)}$ at time $t$ is computed by

$$p_{(ij),t} = \left[P_t\left(S_{(ij)l} \mid X_t\right)\right]$$

where by combining Bayes’ Theorem and the law of total probability we have

$$P_t\left(S_{(ij)l} \mid X_t\right) = \frac{P_t\left(X_t \mid S_{(ij)l}\right) P_t\left(S_{(ij)l}\right)}{\sum_{u=1}^{n_i} P_t\left(X_t \mid S_{(ij)u}\right) P_t\left(S_{(ij)u}\right)}$$
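
The two computations, a Markov step to form the prior and a Bayes step to form the posterior, can be sketched for a single pair of groups as follows. The two behavioral states, the transition matrix, the initial distribution, and the likelihood values are all hypothetical numbers chosen for illustration.

```python
def markov_prior(p_prev, Q):
    """Prior at time t: row vector p_{t-1} multiplied by transition matrix Q."""
    n = len(Q)
    return [sum(p_prev[u] * Q[u][l] for u in range(n)) for l in range(n)]


def bayes_posterior(prior, likelihood):
    """Posterior over behavioral states given P(X_t | state) for each state."""
    joint = [lk * pr for lk, pr in zip(likelihood, prior)]
    total = sum(joint)                  # law of total probability
    if total == 0:
        raise ValueError("observation has zero probability under every state")
    return [j / total for j in joint]


# Hypothetical two-state example: "unsupportive" and "supportive".
Q = [[0.9, 0.1],
     [0.2, 0.8]]
p0 = [0.5, 0.5]
prior = markov_prior(p0, Q)             # one Markov step from the initial state
likelihood = [0.2, 0.6]                 # assumed P(X_t | state) for each state
posterior = bayes_posterior(prior, likelihood)
```

In this example the observed environment is three times more likely under the supportive state, so the posterior shifts probability toward that state relative to the prior.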

Note that if the state of the environment remains constant over time (i.e., $X_t = X$ for all $t$),
the probability distribution assigned to the behavioral state space of the group
will eventually reach a steady state. This follows directly from the properties of discrete-time
Markov chains; see, e.g., (Kulkarni, 2000). However, once the group is perturbed by changing
environmental  conditions  (assuming  the  behavioral  state  of  the  group  is  affected  by  these 
changes),  the  probability  distribution  assigned  to  the  state  space  again  becomes  transient. 
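
The prior and posterior computations described above can be sketched in a few lines. The following is a minimal illustration rather than the NOEM implementation: the function name `mambu_step` and the likelihood values are hypothetical, and the transition matrix is the one used later in Section 4.2.1.

```python
import numpy as np

def mambu_step(p_prev, Q, likelihood):
    """One prior/posterior update over a behavioral state space.

    p_prev     : distribution over the n states at time t-1 (sums to 1)
    Q          : n x n transition probability matrix (rows sum to 1)
    likelihood : P(X_t | state l) evaluated at the observed X_t, length n
    """
    prior = p_prev @ Q                   # Markov-chain prior at time t
    unnorm = likelihood * prior          # numerator of Bayes' Theorem
    posterior = unnorm / unnorm.sum()    # denominator: law of total probability
    return prior, posterior

Q = np.array([[0.95, 0.04, 0.01],
              [0.25, 0.60, 0.15],
              [0.15, 0.25, 0.60]])
p0 = np.array([1.0, 0.0, 0.0])
lik = np.array([0.2, 0.5, 0.3])   # hypothetical likelihood values, illustration only
prior, posterior = mambu_step(p0, Q, lik)
```

The posterior is simply the prior reweighted by how well each behavioral state explains the observed environment, then renormalized.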

The computation of $P(X_t \mid S_{(ij)l})$ can be messy and complicated, depending on the type of
measurements contained in $X_t$, as well as whether or not covariance exists between the different
variables in $X_t$. If the elements of $X_t$ are all continuous variables, one approach is to let

$$P(X_t \mid S_{(ij)l}) = \frac{\exp\left\{-0.5 \, (X_t - \mu_{(ij)l})' \, \Sigma_{(ij)l}^{-1} \, (X_t - \mu_{(ij)l})\right\}}{(2\pi)^{k/2} \, |\Sigma_{(ij)l}|^{1/2}}$$

which is the multivariate normal density function. Note that $\mu_{(ij)l}$ and $\Sigma_{(ij)l}$ denote the mean
vector and covariance matrix, respectively, of the environmental state vector $X_t$, conditional on
the behavioral state $S_{(ij)l}$.
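
As a hedged sketch, the multivariate normal density above can be evaluated directly with NumPy. The helper name `mvn_density` is an illustrative choice, and the example inputs are arbitrary.

```python
import numpy as np

def mvn_density(x, mean, cov):
    """Multivariate normal density of x given a state's mean vector and covariance."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = x.size
    diff = x - mean
    quad = diff @ np.linalg.solve(cov, diff)   # (x - mu)' Sigma^{-1} (x - mu)
    norm_const = (2.0 * np.pi) ** (k / 2.0) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm_const

# At the mean of a standard bivariate normal the density equals 1 / (2 * pi).
density_at_mean = mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Solving the linear system instead of inverting the covariance matrix explicitly is a numerical convenience and does not change the result.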

There are other forms of data besides continuous data. For example, there are also counts
and proportions. Suppose the elements in $X_t$ consist of $q$ counts (e.g., number of violent
incidents in a region, number of civilian casualties, etc.), then a reasonable approximation of
$P(X_t \mid S_{(ij)l})$ is given by the product of Poisson mass functions, or

$$P(X_t \mid S_{(ij)l}) = \prod_{u=1}^{q} \frac{\lambda_{(ij)ul}^{x_{ut}} \exp\{-\lambda_{(ij)ul}\}}{x_{ut}!}$$

where $\lambda_{(ij)ul}$ denotes the mean count rate of the $u$th variable conditioned on the $l$th behavioral
state contained in $A_{(ij)}$. Suppose that $X_t$ consists of $q$ proportions, i.e., numbers between
0 and 1. Then an approximation to $P(X_t \mid S_{(ij)l})$ can be obtained from the product of beta density
functions

$$P(X_t \mid S_{(ij)l}) = \prod_{u=1}^{q} \frac{\Gamma(\alpha_{(ij)ul} + \beta_{(ij)ul})}{\Gamma(\alpha_{(ij)ul}) \, \Gamma(\beta_{(ij)ul})} \, x_{ut}^{\alpha_{(ij)ul} - 1} (1 - x_{ut})^{\beta_{(ij)ul} - 1}$$

where $\alpha_{(ij)ul} > 0$ and $\beta_{(ij)ul} > 0$ denote the parameters governing the shape of the
probability distribution of $X_t$, and $\Gamma$ denotes the Gamma function defined generally as

$$\Gamma(z) = \int_0^{\infty} t^{z-1} \exp(-t) \, dt$$
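
The count and proportion likelihoods above can be sketched with the Python standard library alone. The function names are hypothetical, and working on the log scale via `math.lgamma` is a design choice made here for numerical stability, not something prescribed by the model.

```python
import math

def poisson_likelihood(counts, rates):
    """Product of Poisson mass functions, one factor per count variable."""
    log_lik = 0.0
    for x, lam in zip(counts, rates):
        # log of lam**x * exp(-lam) / x!
        log_lik += x * math.log(lam) - lam - math.lgamma(x + 1)
    return math.exp(log_lik)

def beta_likelihood(props, alphas, betas):
    """Product of beta densities; each proportion must lie strictly in (0, 1)."""
    log_lik = 0.0
    for x, a, b in zip(props, alphas, betas):
        log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        log_lik += log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)
    return math.exp(log_lik)

# Two counts with state-conditional rates 2 and 3; one proportion with alpha = beta = 2.
count_lik = poisson_likelihood([2, 1], [2.0, 3.0])
prop_lik = beta_likelihood([0.5], [2.0], [2.0])
```

For mixed data, a likelihood for a given behavioral state can be formed by multiplying the continuous, count and proportion factors together, under the assumption that the three sets of measures are independent given the state.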

Note that if the measures in $X_t$ consist of a mixture of continuous, count and
proportion data, then a reasonable approximation to $P(X_t \mid S_{(ij)l})$ can be obtained by taking the
product of the marginal density functions corresponding to each measure. For example, let $X_t$
denote an $a$-dimensional vector of continuous variables, $Z_t$ denote a $b$-dimensional vector of
count variables, and $U_t$ a $c$-dimensional vector of proportion variables. Then an approximation
to $P(X_t, Z_t, U_t \mid S_{(ij)l})$ can be obtained from

$$P(X_t, Z_t, U_t \mid S_{(ij)l}) \approx P(X_t \mid S_{(ij)l}) \, P(Z_t \mid S_{(ij)l}) \, P(U_t \mid S_{(ij)l})$$

Of course, in order to compute these probabilities, we must have some knowledge of the
parameters (i.e., $\mu$, $\Sigma$, $\lambda$, $\alpha$, and $\beta$) for each behavioral state space considered. This knowledge
is presumed to come from historical data (or subject matter expert knowledge when empirical
data is lacking). Standard approaches to estimation and inference (i.e., method of maximum
likelihood, two-sample t-tests, etc.) can be used to estimate and draw inference on the unknown
parameters from observed data. Standard statistical approaches are widely available in a variety
of software packages.

To obtain an overall measure of group affinity (or whatever construct is being modeled)
at any time $t$, one can compute an expectation across the distribution assigned to the behavioral
state space. That is, suppose that the cardinality of group $i$'s state space is odd-numbered, then at
each time $t$, group $i$'s affinity toward group $j$ is computed as

$$\Xi_{(ij),t} = p_{(ij),t} \, v$$

where $p_{(ij),t}$ is the posterior probability distribution over the behavioral state space $A_{(ij)}$ at time $t$,
and $v$ denotes the vector of scores in $[-1, 1]$ assigned to the behavioral states, whose definition
uses $\lfloor \cdot \rfloor$, the “floor” function. Note that $\Xi_{(ij),t}$ is an expected value and is contained in
the interval $[-1, 1]$.
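
The affinity computation can be illustrated as follows. The score vector used here is an assumption: it places evenly spaced values from -1 to 1 on the ordered states using the floor of half the number of states, which is consistent with the stated range and with the three-state example, but it should not be read as the report's exact definition. The function name `affinity` is likewise hypothetical.

```python
import numpy as np

def affinity(p):
    """Expected affinity score for a distribution p over an odd number of ordered states."""
    p = np.asarray(p, dtype=float)
    n = p.size
    if n < 3 or n % 2 == 0:
        raise ValueError("expects an odd number (at least 3) of behavioral states")
    half = n // 2                        # floor(n / 2)
    v = (np.arange(n) - half) / half     # assumed scores, e.g. n = 3 gives [-1, 0, 1]
    return float(p @ v)

score = affinity([0.2, 0.3, 0.5])   # 0.2 * (-1) + 0.3 * 0 + 0.5 * 1
```

With three states, all mass on the first state gives -1, all mass on the last gives 1, and a symmetric distribution gives 0.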

For  example,  suppose  that  we  again  consider  a  regional  populace,  and  its  affinity  toward 
the  regional  government.  Suppose  that  we  consider  three  behavioral  states  for  the  regional 
populace:  1)  Unsupportive,  2)  Neither  Supportive  nor  Unsupportive,  and  3)  Supportive  of  the 
regional government. Then an affinity score close to -1 would suggest that the population is in
the ‘Unsupportive’ state. On the other hand, an affinity score close to 1 would suggest that the
population is in the ‘Supportive’ state.

As  a  final  addition  to  the  model,  it  is  important  to  consider  the  fact  that  individuals  (and 
hence,  groups)  retain  memory  of  past  events.  Thus,  MAMBU  accounts  for  this  by  geometrically 
weighting  the  posterior  probability  distributions  assigned  to  the  behavioral  state  space  as  they 
age  with  time.  As  the  weights  decrease  geometrically,  so  does  the  contribution  of  the  posterior 
distributions  for  which  the  weights  are  assigned.  The  posterior  probability  distribution  over  the 
behavior  state  space  at  time  t  is  then  computed  from 

where $\psi_i \in (0, 1]$ for all $i$ and denotes the weighting coefficient corresponding to group $i$. Larger
values for $\psi_i$ indicate that group $i$ applies more weight to current environmental conditions, and
less weight to past events or conditions. On the other hand, smaller values for $\psi_i$ suggest that
group $i$ places less weight on current conditions and more on past events or conditions. The
value for $\psi_i$ is likely to be set by subject matter experts knowledgeable about group $i$.
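
One way to realize geometric weighting of this kind is an exponentially weighted recursion, sketched below. This is an assumed form chosen because it reproduces the stated behavior (a coefficient of 1 uses only the current posterior, and the weight on a posterior that is $s$ periods old decays as $\psi_i (1 - \psi_i)^s$); it is not necessarily the exact weighting equation used in MAMBU. The function name `weighted_posterior` is hypothetical.

```python
import numpy as np

def weighted_posterior(posteriors, psi, p_init):
    """Blend a sequence of posteriors so that older ones receive geometrically smaller weight."""
    smoothed = np.asarray(p_init, dtype=float)
    for post in posteriors:
        # Weight psi on the current posterior, 1 - psi on everything accumulated so far.
        smoothed = psi * np.asarray(post, dtype=float) + (1.0 - psi) * smoothed
    return smoothed

# One new posterior fully on 'supportive', starting from fully 'unsupportive'.
blended = weighted_posterior([[0.0, 0.0, 1.0]], psi=0.25, p_init=[1.0, 0.0, 0.0])
```

Because each step is a convex combination of probability vectors, the result remains a valid probability distribution.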
4.0 RESULTS AND DISCUSSION


In  this  section  we  demonstrate  application  of  the  proposed  models  developed  in  the 
previous  section  and  provide  some  additional  discussion.  We  use  simulated  data  sets  in  order  to 
control  ‘ground  truth’.  We  also  discuss  the  use  of  these  models  within  the  NOEM  framework. 

4.1  Application  of  Statistical  Model 

To  illustrate  the  proposed  statistical  model,  consider  the  following.  Suppose  that  a 
correlation (as opposed to causation) analysis is to be conducted using observational data with
the objective of determining whether group unemployment rate and individual skill level affect
attitudes towards immigrants. Note here that ‘unemployment rate’ is a group aggregate measure,
while  skill  level  is  an  individual  characteristic.  Thus,  we  are  dealing  with  cross-level  factors  in 
this  example.  Suppose  that  response  data  consists  of  five-level  Likert-item  responses  to  some 
statement  construct  regarding  immigrants.  Suppose  further  that  8  groups  are  randomly  selected 
from  a  larger  population  of  groups  available  for  study,  and  within  each  group,  40  individuals  are 
randomly  selected  for  observation.  For  each  group  selected,  suppose  that  the  unemployment  rate 
for  that  group  was  recorded  (low  or  high),  as  well  as  the  skill  level  (low  or  high)  of  each 
individual  surveyed  within  the  groups.  Suppose  that  each  individual  was  asked  to  respond  to  the 
following  statement  construct  on  a  five-point  Likert-item:  “Immigrants  are  a  threat  to  my 
economic security.” Note that the five-point Likert-item has response options “Strongly
Disagree”, “Disagree”, “Neither Agree nor Disagree”, “Agree”, and “Strongly Agree”. The
simulated data sets are shown in histogram format in Figure 1.

Figure 1: Likert-item Responses from Eight Groups, each with 40 Individuals

Each  histogram  shown  in  Figure  1  represents  simulated  responses  obtained  from 
members  of  the  same  group.  The  true  simulated  effect  for  unemployment  rate  was  0.8725. 
Additionally,  the  true  simulated  effects  due  to  skill  level  and  the  unemployment  rate  x  skill  level 
interaction  were  -0.1025  and  0.4550,  respectively.  A  random  effect  was  also  simulated  to  induce 
additional  unexplainable  variation  at  the  group  level.  Using  this  data,  we  demonstrate  our 
proposed analysis approach. The top plot in Figure 2 shows a plot of the log-likelihood function
(given earlier) over a range of values for the transformation parameter, $\theta$. Notice that the
recommended transformation on the response is the square-root transformation since $\theta = 1/2$.

Figure 2: Plot of Log-likelihood Function versus Transformation Parameter

The  numerical  results  of  the  analysis  are  shown  in  Table  1,  including  estimated 
coefficients,  standard  errors,  and  p-values.  Notice  that  if  unemployment  rate  is  high,  then  Table 
1  suggests  that  the  proportion  of  individuals  who  agree  with  the  statement  construct  “Immigrants 
are  a  threat  to  my  economic  security”  increases.  Similarly,  Table  1  suggests  that  an  increase  in 
individual  skill  level  is  often  accompanied  by  an  increase  in  disagreement  with  the  above 
statement  construct.  Lastly,  we  see  that  the  interaction  effect  between  the  two  variables  is  also 
deemed  statistically  significant.  Of  course,  these  results  are  expected  since  the  data  were 
simulated  and  the  true  simulated  effects  are  known.  Note  that  in  Table  1  the  standard  error  (S.E.) 
of  the  estimated  coefficient  for  unemployment  rate  is  larger  than  the  standard  errors  of  the 
estimated  coefficients  for  skill  level  and  the  unemployment  rate  x  skill  level  interaction.  This  is 
an  expected  result  since  we  only  have  eight  observations  at  the  group-level,  compared  to  320 
observations at the individual level. Since we have a small number of samples at the group level,
the t-distribution with 6 degrees of freedom was used as the reference distribution for
unemployment rate, while the standard normal distribution was used as the reference distribution
for factors at the individual level. Note also that the estimates for the between-group and
within-group variance components are 0.015 and 0.5594, respectively, suggesting most of the
unexplainable variability is due to within-group differences. At this point, the researcher might
seek other factors at the individual level in
attempts to mitigate some of the unexplainable variability contained at this level.

Table 1: Effect Estimates and Estimated Standard Errors of Effect Estimates

Source                     Coeff      S.E.      p-value
Unemployment Rate          0.2812     0.0601    0.0017
Skill Level               -0.0946     0.0422    0.0125
Unemploy x Skill Level     0.0713     0.0422    0.0455


Since the analysis was conducted in the transformed units, we need to transform the
model back into the original units for prediction purposes. Doing so yields the following
prediction equation, where $z_i$ and $x_{ij}$ denote observed or known values for group $i$'s
unemployment rate and individual $j$'s skill level (where individual $j$ is a member of group $i$):

$$\hat{y}_{ij} = E(y_{ij} \mid z_i, x_{ij}) = \exp\left\{\frac{\log_e\left(1 + 0.5\,(1.25 + 0.28 z_i - 0.09 x_{ij} + 0.07 z_i x_{ij})\right)}{0.5}\right\}$$

where $\hat{y}_{ij}$ denotes the expected response on a five-point Likert-item to the statement construct
“Immigrants are a threat to my economic security,” as a function of group unemployment rate
and individual skill level. It is important to note that using the proposed approach, inference can
be drawn on the entire population of groups, as opposed to only the groups considered in the
study, since it is assumed that each group considered in the study was randomly selected from a
larger population of groups existing in the OE. If the entire population of groups had
been sampled, there would be no random effect due to groups, and one could use
standard multiple linear regression methods to conduct the analysis.

Suppose that we used standard methods to conduct the analysis instead of the proposed
strategy. In this particular case, one can apply the Box-Cox transformations (Box & Cox, 1964)
to find the appropriate power transform on the response. For this example, applying the
Box-Cox transformations yields $\theta = 1/2$, so that the recommended transformation is again the
square-root transformation. The numerical results are given in Table 2. Notice that when using
standard regression in the presence of a random group effect, the factor unemployment rate
appears to be highly significant, while the remaining factors at the individual level go undetected
(say, at the 0.05 level of significance). This is a major disadvantage of using standard statistical
methods on grouped responses. In particular, the type II error associated with effects studied at
the group level increases, and the power to detect factor effects studied at the individual level
decreases.

Table 2: Effect Estimates and Estimated Standard Errors of Effect Estimates using
Standard Multiple Linear Regression

Source                     Coeff      S.E.      p-value
Unemployment Rate          0.2813     0.0679    < 0.0000
Skill Level               -0.0951     0.0679    0.0809
Unemploy x Skill Level     0.0699     0.0679    0.1518

The model proposed in this section is a powerful statistical technique for determining
factors  that  can  explain  the  variability  in  one  or  more  group  constructs  of  interest,  to  include 
attitudes  and  behaviors  towards  other  (human)  groups.  Since  groups  share  commonalities  on  a 
variety  of  dimensions,  e.g.,  goals,  ethnicity,  culture,  beliefs,  values,  etc.,  this  complicates  the 
resulting  statistical  analysis,  particularly  when  the  researcher  is  interested  in  assessing  the 
significance  of  both  group-level  and  individual-level  factors.  The  consequences  of  analyzing 
grouped  data  using  standard  ANOVA  or  regression  methods  can  be  severe,  often  resulting  in 
misleading  results. 

With  respect  to  the  NOEM,  the  proposed  model  can  be  used  in  establishing  empirical 
relationships  between  factors  and  groups  modeled  within  the  NOEM.  Using  the  proposed 
model, one can study the ‘forces’ acting on a population/group of interest that shape its attitude
and behavior. Once a fitted model is obtained, it can be implemented into the NOEM in the form
of a meta-model. To demonstrate, consider the hypothetical example given above where group
attitudes towards immigrants were studied as a function of unemployment rate and individual
skill level. Since individual-level variables are not modeled within the NOEM framework, our
focus might be on predicting group attitudes and behavior towards another group (e.g.,
immigrants) as a function of group unemployment rate. However, the above fitted model still
can be used as a prediction equation by working with partitions of group $i$ on the basis of skill
level. That is, group $i$ can be partitioned into two groups, one having a high skill set and another
having a low skill set. Attitudes and behavior of groups with low skill sets and high skill sets
can then be predicted as functions of group unemployment rates. For groups with high skill
sets, the prediction equation becomes

$$\hat{y}_i = \exp\left\{\frac{\log_e(1.58 + 0.175 z_i)}{0.5}\right\}$$

and similarly for groups with low skill sets

$$\hat{y}_i = \exp\left\{\frac{\log_e(1.67 + 0.105 z_i)}{0.5}\right\}$$

For  example,  consider  individuals  in  group  i  having  a  high  skill  set,  then,  based  on  the 
above analysis, if group unemployment is low (i.e., $z_i = -1$), we expect the average response on the
Likert-item to lie somewhere between “Strongly Disagree” and “Disagree”. On the other hand,
if group unemployment is high (i.e., $z_i = 1$), we expect the average response on the Likert-item to lie
somewhere between “Neither Agree nor Disagree” and “Agree”. We find similar interpretations
for  members  of  group  i  having  a  low  skill  set. 
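
The predictions quoted above can be checked numerically. The sketch below assumes that both factors are coded -1 (low) and +1 (high); the coding of skill level is inferred from the fact that it reproduces the two partitioned equations, and the function name is illustrative.

```python
import math

def predicted_response(z, x):
    """Back-transformed Likert prediction; z = unemployment, x = skill, each coded -1 or +1."""
    eta = 1.25 + 0.28 * z - 0.09 * x + 0.07 * z * x      # fitted linear predictor, transformed scale
    return math.exp(math.log(1.0 + 0.5 * eta) / 0.5)     # equivalent to (1 + 0.5 * eta) ** 2

for x, skill in [(1, "high skill"), (-1, "low skill")]:
    for z, unemp in [(-1, "low unemployment"), (1, "high unemployment")]:
        print(f"{skill}, {unemp}: {predicted_response(z, x):.2f}")
```

For the high-skill partition this gives about 1.97 under low unemployment and 3.08 under high unemployment; for the low-skill partition, about 2.45 and 3.15. On the five-point scale (1 = “Strongly Disagree”, 5 = “Strongly Agree”) these values match the interpretation given in the text.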

4.2  Applications  of  MAMBU 

In  each  of  the  examples  that  follow,  we  describe  hypothetical  examples  involving  a 
regional  populace  and  a  governing  force.  We  should  note  that  although  the  following  examples 
involve  describing  the  affinity  that  a  regional  populace  has  towards  another  group,  MAMBU  is 
certainly  not  limited  to  the  modeling  of  regional  populations  and  their  attitudes  towards  a 
governing  force.  In  fact,  we  believe  that  MAMBU  has  application  in  a  number  of  different 
areas.  For  example,  MAMBU  might  be  implemented  within  a  combat  simulation  model  in  order 
to  model  the  process  of  assessing  battle  damage  at  the  group  or  individual  level.  Additionally,  it 
might find application in ‘what if’ studies involving the implementation of new policies.

4.2.1  Example  with  Continuous  Environmental  Variables 

Suppose  that  we  are  interested  in  modeling  changes  in  the  ‘affinity’  that  a  regional 
populace  has  toward  the  governing  force  over  time.  Let  us  consider  the  following  behavioral 
state  space 

$A_{(12)}$ = [Unsupportive, Neither Supportive nor Unsupportive, Supportive]

Suppose that we consider two continuous environmental variables:

1. Units of potable water supplied to the region at time $t$ ($X_{1t}$)

2. Units of electric power supplied to the region at time $t$ ($X_{2t}$)

Suppose that historical observations exist on $X_1$ and $X_2$ over some finite time horizon where
the regional populace was known to be ‘unsupportive’, ‘neither supportive nor unsupportive’ and
‘supportive’ towards the regional government, perhaps determined by analysis of public opinion
polls over time. Figures 3-5 show historical (i.e., simulated for the purpose of this example)
values for $X_1$ and $X_2$ assumed to have been jointly observed over a total of 100 time units, and
under the perception that the regional population was ‘Unsupportive’, ‘Neither Supportive nor
Unsupportive’, and ‘Supportive’ of the regional government, respectively.


Figure 3: Jointly Observed Values of X1 and X2 when Population is Perceived to be
'Unsupportive' of the Governing Force [axis: Units of Potable Water Delivered]

Figure 4: Jointly Observed Values of X1 and X2 when Population is Perceived to be
'Neither Supportive nor Unsupportive' of Governing Forces [axis: Units of Potable Water Supplied]

Figure 5: Jointly Observed Values of X1 and X2 when Population is Perceived to be
'Supportive' of Governing Forces [axis: Units of Potable Water Delivered]

Using the data shown in Figures 3-5, we compute maximum likelihood estimates (under
multivariate normal theory model assumptions) of the mean vectors and covariance matrices
corresponding to each of the behavioral states using standard statistical software. These are
given by:

$$\hat{\mu}_{(12),\text{'Unsupp'}} = [6.7857, \; 95.0406]' \quad \text{and} \quad \hat{\Sigma}_{(12),\text{'Unsupp'}} = \begin{bmatrix} 9.9931 & 2.1967 \\ 2.1967 & 9.0597 \end{bmatrix}$$

$$\hat{\mu}_{(12),\text{'Neither'}} = [10.0968, \; 100.368]' \quad \text{and} \quad \hat{\Sigma}_{(12),\text{'Neither'}} = \begin{bmatrix} 10.1014 & 1.5380 \\ 1.5380 & 8.8726 \end{bmatrix}$$

$$\hat{\mu}_{(12),\text{'Supp'}} = [14.841, \; 110.054]' \quad \text{and} \quad \hat{\Sigma}_{(12),\text{'Supp'}} = \begin{bmatrix} 5.8472 & 2.0341 \\ 2.0341 & 6.4130 \end{bmatrix}$$

In  order  to  assess  whether  or  not  there  is  a  statistical  difference  between,  e.g.,  the  mean  units 
of  potable  water  delivered  when  the  populace  was  observed  to  be  ‘unsupportive’  and  the  mean 
units  of  potable  water  delivered  when  the  populace  was  observed  to  be  ‘neither  supportive  or 
unsupportive’,  one  can  use  standard  statistical  methods  for  making  multiple  comparisons  given 
in,  e.g.,  (Wu  &  Hamada,  2000)  or  (Montgomery,  2005). 

For example, suppose we want to determine if there is a difference between the mean units of
potable water delivered when the populace was assumed to be in the ‘unsupportive’ behavioral
state, and the mean units of potable water delivered when the populace was assumed to be in the
‘neither supportive nor unsupportive’ state. Using standard methods for multiple comparisons
involves computing the t-statistics

$$t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{s_p \sqrt{1/n_i + 1/n_j}}$$

for all $i, j$ ($i \neq j$), where $\bar{x}_i$ and $\bar{x}_j$ denote sample averages and $s_i^2$ and $s_j^2$ denote sample
variances corresponding to the $i$th and $j$th behavioral states, respectively, and

$$s_p^2 = \frac{(n_i - 1) s_i^2 + (n_j - 1) s_j^2}{n_i + n_j - 2}$$

denotes the pooled estimator of the unexplainable error variance $\sigma^2$ (i.e., the error source not
explainable by the populace’s behavioral state). Note also that $n_i$ and $n_j$ denote the number of
samples corresponding to behavioral states $i$ and $j$, respectively. The test is conducted by
comparing $t_{ij}$ to critical values of the t distribution with $n_i + n_j - 2$ degrees of freedom. To
better control the overall type I error, one can alternatively use Bonferroni or Tukey critical
values, e.g., see (Wu & Hamada, 2000), (Montgomery, 2005).

Note that in order to pool the variances, it is required that the variance of $X$ be
homogeneous across the different behavioral states. For the example presented here, although
the assumption of constant variance appears to be valid across the ‘unsupportive’ and ‘neither
supportive nor unsupportive’ states (for both variables $X_1$ and $X_2$), the variances of the
observations on the variables when the populace was perceived to be in the
‘supportive’ state appear to be smaller. If the variances cannot be assumed constant across
behavioral states, then an approximate test involves computing

$$t_{ij} = \frac{\bar{x}_i - \bar{x}_j}{\sqrt{s_i^2/n_i + s_j^2/n_j}}$$

and comparing $t_{ij}$ to the t distribution with degrees of freedom

$$\nu = \frac{\left(s_i^2/n_i + s_j^2/n_j\right)^2}{\dfrac{(s_i^2/n_i)^2}{n_i + 1} + \dfrac{(s_j^2/n_j)^2}{n_j + 1}} - 2$$

For this example, we have a total of 3 comparisons to make: $t_{12}$, $t_{13}$, and $t_{23}$. Since we
have 100 observations on each variable for each behavioral state, the test statistic $t_{12}$ (for each
variable) is compared to the t distribution with 198 degrees of freedom, while the test statistics
$t_{13}$ and $t_{23}$ (for each variable) are compared to the t-distribution with approximately $\nu = 188$
degrees of freedom. Note that since the degrees of freedom for each test is large in this case, one
can approximate the critical values of the test rather precisely using the standard normal
distribution. Doing so we find that, for the variable $X_1$, we have $t_{12} = -7.41$, $t_{13} = -20.23$ and
$t_{23} = -11.89$, and at the $\alpha = 0.05$ level we see that $|t_{ij}| > z_{0.025} = 1.96$ for all $i$ and $j$ ($i \neq j$).
Repeating this procedure for the variable $X_2$ we obtain similar results. The results of the
multiple comparison analysis provide justification for inclusion of the environmental factors (i.e.,
potable water and electric power) in the model. It provides empirical evidence that the levels of
the variables of interest are in fact different, depending on the behavioral state of the group.
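
The three comparisons for potable water can be recomputed from the reported estimates as a rough check. This sketch treats the reported maximum likelihood variances as the sample variances and uses the common form of the unequal-variance degrees of freedom with $n + 1$ in the denominators and 2 subtracted; both choices are assumptions, and the function names are illustrative.

```python
import math

n = 100  # observations per behavioral state
# (mean, variance) of potable water X1 under each state, from the reported estimates
stats = {1: (6.7857, 9.9931),     # unsupportive
         2: (10.0968, 10.1014),   # neither supportive nor unsupportive
         3: (14.841, 5.8472)}     # supportive

def pooled_t(i, j):
    """Two-sample t statistic with a pooled variance and n_i + n_j - 2 degrees of freedom."""
    (mi, vi), (mj, vj) = stats[i], stats[j]
    sp2 = ((n - 1) * vi + (n - 1) * vj) / (2 * n - 2)
    return (mi - mj) / math.sqrt(sp2 * (1.0 / n + 1.0 / n)), 2 * n - 2

def unequal_var_t(i, j):
    """Approximate t statistic and degrees of freedom when variances differ."""
    (mi, vi), (mj, vj) = stats[i], stats[j]
    a, b = vi / n, vj / n
    t = (mi - mj) / math.sqrt(a + b)
    dof = (a + b) ** 2 / (a ** 2 / (n + 1) + b ** 2 / (n + 1)) - 2
    return t, dof

t12, dof12 = pooled_t(1, 2)
t13, dof13 = unequal_var_t(1, 3)
t23, dof23 = unequal_var_t(2, 3)
```

Under these assumptions the statistics come out near -7.39, -20.24 and -11.88, with roughly 187 degrees of freedom for the unequal-variance tests. They are close to, though not identical with, the reported values, as expected when working from rounded summary estimates, and all exceed 1.96 in absolute value.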

If  we  examine  the  parameter  estimates,  note  that  it  only  takes  a  loss  of  approximately  3 
units  of  potable  water  and  5  units  of  electric  power,  on  average,  before  the  population  transitions 
from  the  ‘neither  supportive  or  unsupportive’  to  ‘unsupportive’  states.  However,  it  takes 
upwards  of  5  additional  units  of  potable  water  and  10  additional  units  of  electric  power,  on 
average,  before  the  population  transitions  from  the  ‘neither  supportive  or  unsupportive’  to 
‘supportive’ states. Also, as noted previously, the variances of $X_1$ and $X_2$ are much smaller when
the  population  is  in  the  ‘supportive’  state.  The  interpretation  of  this  non-constant  variance  is  that 
the  population  has  less  tolerance  to  variations  in  the  amounts  of  potable  water  and  electric  power 
supplied  by  the  government.  That  is,  for  the  population  to  maintain  its  support  for  the 
government,  it  expects  to  receive  roughly  15  units  of  potable  water  and  110  units  of  electric 
power  on  a  consistent  basis. 

To  completely  populate  MAMBU,  we  also  require  a  transition  probability  matrix  for  the 
group  under  consideration,  as  well  as  an  initial  state  probability  distribution  vector.  The 
transition  matrix  for  this  example  was  chosen  to  be 

$$Q_{(12)} = \begin{bmatrix} 0.95 & 0.04 & 0.01 \\ 0.25 & 0.60 & 0.15 \\ 0.15 & 0.25 & 0.60 \end{bmatrix}$$

and the initial state probability vector is $p_{(12),0} = [1, 0, 0]$, suggesting that initially, the regional
population is unsupportive of the government. Note that the transition probability matrix
represents our prior knowledge with respect to the likelihood of the populace transitioning
between behavioral states. For example, given that the population is currently in the
‘unsupportive’ state, what is the probability that the population will transition into the
‘supportive’ state at the next point in time? For this example, $Q_{(12)}$ was selected on the basis of
the steady-state distribution determined from the transition probability matrix, which is

$$p = [0.8113, \; 0.1225, \; 0.0662]$$

suggesting  that  at  any  given  time,  roughly  81%  of  the  population  will  lie  in  the  ‘unsupportive’ 
state. It should be noted that the settings for the elements in $Q_{(12)}$ are not crucial since they
represent our prior knowledge. If no prior information is available, a reasonable setting for
$Q_{(12)}$ is

$$Q_{(12)} = \begin{bmatrix} 0.34 & 0.33 & 0.33 \\ 0.34 & 0.33 & 0.33 \\ 0.34 & 0.33 & 0.33 \end{bmatrix}$$

so that the steady-state probability vector is approximately uniform across the behavioral state
space. Additionally, if the initial-state probability distribution vector is unknown, one can use
$p_{(12),0} = [0.34, 0.33, 0.33]$, which is again approximately uniform. A uniform distribution across
the behavioral state space implies that the populace can be in any of the three states with equal
probability. Lastly, with respect to the geometric weighting, we assumed $\psi = 0.25$, implying
that the population places 75% of its weight on past events and only 25% on current events.
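
The steady-state vectors quoted above can be verified numerically. The sketch below solves the stationarity condition together with the requirement that the probabilities sum to one; the helper name `steady_state` and the least-squares formulation are illustrative choices.

```python
import numpy as np

def steady_state(Q):
    """Stationary distribution pi satisfying pi Q = pi and sum(pi) = 1."""
    n = Q.shape[0]
    # Stack the n stationarity equations with the normalization constraint.
    A = np.vstack([Q.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q_prior = np.array([[0.95, 0.04, 0.01],
                    [0.25, 0.60, 0.15],
                    [0.15, 0.25, 0.60]])
Q_vague = np.array([[0.34, 0.33, 0.33]] * 3)   # every row identical

pi_prior = steady_state(Q_prior)
pi_vague = steady_state(Q_vague)
```

The first matrix gives approximately [0.8113, 0.1225, 0.0662]. When every row of the transition matrix is the same, the steady-state vector equals that row, so the second matrix gives [0.34, 0.33, 0.33].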

Figure  6  shows  the  output  of  running  MAMBU  over  a  length  of  365  time  periods.  The 
values of $X_1$ and $X_2$ were simulated from a multivariate normal distribution with
$E(X_1) = 10$, $E(X_2) = 100$ and

$$\Sigma = \begin{bmatrix} 5 & 2.5 \\ 2.5 & 5 \end{bmatrix}$$

Figure 6: Affinity Profile of Regional Populace Over Time. Note that E(X1)=10 and
E(X2)=100 [axis: Time Index]

Notice in Figure 6 that by delivering only an average of 10 units of potable water and
100 units of electric power, the affinity of the population towards the government is never
positive. This is because the population applies more weight to electric power than to potable
water, as determined by the estimated variance-covariance matrices of $X_1$ and $X_2$ conditioned on
the  behavioral  state  spaces.  If  the  average  units  of  potable  water  remain  constant  at  10  units, 
then  an  average  of  about  103  units  of  electric  power  will  bring  the  population  to  a  ‘neutral’  state. 
This  is  shown  in  Figure  7. 


Figure 7: Affinity Profile of Regional Populace Over Time. Note that E(X1)=10 and
E(X2)=103

Suppose that the average units of potable water delivered remains at 10 and the average
units of electric power delivered increases over time. Then Figure 8 shows increases in affinity
as a result of improved environmental conditions (i.e., increased electrical power to the region).
Similar results are shown in Figure 9, where although a decrease in average potable water is
observed, a large increase in the average units of electricity delivered causes a large increase in
the population affinity towards the government.

Figure 8: Affinity Profile of Regional Populace Over Time. Note that E(X1)=10 and E(X2)
is Increasing Over Time

Figure 9: Affinity Profile of Regional Populace Over Time. Note that both X1 and X2
Exhibit Change Points

This example illustrates the utility of MAMBU for implementation into the NOEM. It
permits  the  capture  of  changes  in  affinity  due  to  changes  in  the  state  of  the  environment  at  any 
given  time.  As  environmental  conditions  are  updated  in  NOEM,  so  are  the  affinities  between 
groups  modeled  within  NOEM  via  MAMBU.  Further,  MAMBU  is  empirically  grounded  and 
fairly  easy  to  populate  so  long  as  empirical  data  exists.  If  empirical  data  does  not  exist,  then  it  is 
recommended  that  MAMBU  be  populated  by  subject  matter  experts. 
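
To make the example concrete, the sketch below runs the update over 365 simulated periods using the parameter estimates, transition matrix, initial vector and weighting coefficient given above. Several pieces are assumptions rather than the report's exact formulation: the exponentially weighted form of the memory step, the use of the weighted distribution as the input to the next prior, the score vector [-1, 0, 1], and the random seed. The function names are hypothetical, and the output is not expected to reproduce Figure 6 point for point.

```python
import numpy as np

# State order: unsupportive, neither, supportive (estimates from the example above)
means = [np.array([6.7857, 95.0406]),
         np.array([10.0968, 100.368]),
         np.array([14.841, 110.054])]
covs = [np.array([[9.9931, 2.1967], [2.1967, 9.0597]]),
        np.array([[10.1014, 1.5380], [1.5380, 8.8726]]),
        np.array([[5.8472, 2.0341], [2.0341, 6.4130]])]
Q = np.array([[0.95, 0.04, 0.01],
              [0.25, 0.60, 0.15],
              [0.15, 0.25, 0.60]])
p0 = np.array([1.0, 0.0, 0.0])
psi = 0.25
v = np.array([-1.0, 0.0, 1.0])   # assumed affinity scores for the three states

def log_mvn(x, mean, cov):
    """Log of the multivariate normal density."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (quad + logdet + x.size * np.log(2.0 * np.pi))

def run_mambu(X):
    """Return the affinity at each time step and the final state distribution."""
    p = p0.copy()
    scores = []
    for x in X:
        prior = p @ Q                                     # Markov prior
        loglik = np.array([log_mvn(x, m, c) for m, c in zip(means, covs)])
        w = np.exp(loglik - loglik.max()) * prior         # rescaled to avoid underflow
        posterior = w / w.sum()                           # Bayes update
        p = psi * posterior + (1.0 - psi) * p             # assumed memory weighting
        scores.append(p @ v)
    return np.array(scores), p

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
X = rng.multivariate_normal([10.0, 100.0], [[5.0, 2.5], [2.5, 5.0]], size=365)
affinity_path, final_p = run_mambu(X)
```

Changing the mean vector passed to the simulator (for example, raising the electric power mean over time) is the analogue of the scenarios shown in Figures 7 through 9.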

4.2.2  Example  with  Count  Environmental  Variables 

For  this  example,  suppose  we  are  interested  in  the  affinity  between  a  regional  populace  in 
Iraq  and  United  States  (US)  Forces  in  the  region.  For  this  example,  the  state  of  the  environment 
at  time  t  is  measured  by  the  following: 

1. Number of violent incidents ($X_{1t}$)

2. Number of civilian casualties ($X_{2t}$)

3. Number of enemy casualties ($X_{3t}$)

Suppose  that  if  the  number  of  violent  incidents  or  the  number  of  civilian  casualties  increases, 
the  regional  populace’s  stance  toward  the  US  Forces  will  approach  hostility.  Further,  if  the 
number  of  enemy  casualties  decreases,  so  will  the  regional  population’s  support  for  the  US 
Forces.  On  the  other  hand,  decreases  in  the  number  of  violent  incidents  and  the  number  of 
civilian  casualties  will  gain  some  support  for  US  Forces  in  the  region.  Additionally,  an  increase 
in  the  number  of  enemy  casualties  will  also  gain  support  for  US  Forces.  Of  course,  all  of  these 
effects  should  be  verified  by  subject  matter  experts,  or  validated  through  empirical  studies 
(perhaps  using  the  statistical  model  developed  earlier  or  by  the  multiple  comparisons  procedure 
used  in  the  previous  example). 

Let  us  define  the  behavioral  state  space  of  the  regional  population  by 

$$A_{(12)} = [\text{Hostile},\ \text{Neutral},\ \text{Friendly}]$$

and let the transition probability matrix be given as that in the example above, or

$$Q_{(12)} = \begin{bmatrix} 0.95 & 0.04 & 0.01 \\ 0.25 & 0.60 & 0.15 \\ 0.15 & 0.25 & 0.60 \end{bmatrix}$$

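
As a rough illustration of how a transition matrix of this kind is used, the sketch below checks that each row of Q(12) sums to one and propagates a distribution over behavioral states one step forward. The uniform starting distribution and the helper name `predict` are assumptions made for illustration only; they are not taken from the report.

```python
# Transition matrix Q(12) from the text; rows and columns are ordered
# Hostile, Neutral, Friendly.
STATES = ["Hostile", "Neutral", "Friendly"]
Q = [
    [0.95, 0.04, 0.01],
    [0.25, 0.60, 0.15],
    [0.15, 0.25, 0.60],
]


def predict(prior, q):
    """One-step Markov prediction: returns the row vector prior * q."""
    n = len(q)
    return [sum(prior[i] * q[i][j] for i in range(n)) for j in range(n)]


# Each row must be a probability distribution over the next state.
for row in Q:
    assert abs(sum(row) - 1.0) < 1e-12

# Assumed (hypothetical) uniform starting distribution.
prior = [1 / 3, 1 / 3, 1 / 3]
predicted = predict(prior, Q)
for state, p in zip(STATES, predicted):
    print(f"{state}: {p:.4f}")
```

With this matrix, a group that starts in the hostile state remains hostile with probability 0.95 after one step, which is why a hostile stance tends to persist unless the observed environment argues strongly against it.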

For this example, suppose it was determined by subject matter experts that, given the regional
populace is 'hostile' towards US Forces, one can expect 15 violent incidents per day in this
region, on average. Further, given the regional population holds a 'neutral' stance towards US
Forces, one can expect 5 violent incidents per day, on average. Similarly, conditioned on the
'friendly' behavioral state, one can expect 1 violent incident per day, on average. Proceeding in
this fashion, suppose that subject matter expert input yielded the following:

$$\lambda_{(12),\text{'Hostile'}} = [15,\ 12,\ 0.05]$$

$$\lambda_{(12),\text{'Neutral'}} = [5,\ 3,\ 10]$$

$$\lambda_{(12),\text{'Friendly'}} = [1,\ ?,\ ?0]$$
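
The conditional means above can be combined with a prior over behavioral states through Bayes' rule. The following is a minimal sketch of a single update, not the report's implementation. To keep it small it uses only the first environmental variable (violent incidents), whose conditional means of 15, 5, and 1 per day are stated in the text. The uniform prior and the function names are assumptions.

```python
import math

STATES = ["Hostile", "Neutral", "Friendly"]
# Mean violent incidents per day given each state (values stated in the text).
LAMBDA_INCIDENTS = {"Hostile": 15.0, "Neutral": 5.0, "Friendly": 1.0}


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), evaluated in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def bayes_update(prior, count):
    """Posterior over states after observing `count` violent incidents."""
    unnorm = {
        s: prior[s] * poisson_pmf(count, LAMBDA_INCIDENTS[s]) for s in STATES
    }
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}


# Assumed (hypothetical) uniform prior over the three states.
prior = {s: 1 / 3 for s in STATES}
posterior = bayes_update(prior, count=12)
for state in STATES:
    print(f"{state}: {posterior[state]:.4f}")
```

A day with 12 violent incidents shifts most of the probability mass to the hostile state, while a day with none favors the friendly state. In a full filter, the prior for each day would come from the previous posterior propagated through the transition matrix.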

In this example, the X's were simulated from Poisson($\lambda$) distributions. Figure 10 shows
the changes in the environmental variables and the populace affinities towards US Forces over
time. Note that

1. At time 66, the mean number of violent incidents and mean number of civilian casualties
increase, while the mean number of enemy casualties decreases.

2. At time 126, the mean number of violent incidents and mean number of civilian
casualties decrease, while the mean number of enemy casualties increases.

3. At time 258, the mean number of violent incidents and mean number of civilian
casualties decrease, while the mean number of enemy casualties increases.

Notice in Figure 10 that negative affinities are produced between times 66 and 125, due to a high
level of violent incidents and civilian casualties, and a low level of enemy casualties. Further,
positive affinities are produced between times 258 and 365, due to low levels of violent
incidents and civilian casualties, and a high level of enemy casualties.

Figure 10: Affinity Profile of Regional Populace Over Time as Function of Number of Three Count Variables


For this example, we set $\psi = 0.25$ in the geometric weighting scheme. As in the previous
example, the implication here is that the regional populace places about 25% of its weight on
current environmental conditions, and 75% on recent or past events.
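
The 25%/75% reading can be checked numerically. The sketch below assumes the geometric weights take the common form ψ(1 − ψ)^k for lag k, with k = 0 the current period. This form is an assumption chosen to match the interpretation in the text, and the function names are hypothetical; the report's own definition of the weighting scheme appears in an earlier section.

```python
def geometric_weights(psi, n_lags):
    """Weights psi * (1 - psi)**k for lags k = 0 .. n_lags - 1 (k = 0 is now)."""
    return [psi * (1 - psi) ** k for k in range(n_lags)]


def weighted_average(history, psi):
    """Geometrically weighted average of `history`, most recent value last.

    The weights are renormalised because the history is finite.
    """
    weights = geometric_weights(psi, len(history))
    recent_first = history[::-1]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, recent_first)) / total


weights = geometric_weights(0.25, 50)
print(f"weight on current period: {weights[0]:.2f}")
print(f"weight on past periods:   {sum(weights[1:]):.2f}")
```

Under this assumed form, a smaller ψ gives the populace a longer memory, so a single good or bad day moves the affinity less.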

4.2.3  Example  with  Mixed  Environmental  Variables 

For  this  example,  we  will  assume  the  same  scenario  as  the  example  above  (i.e.,  Regional 
populace  and  US  Forces),  except  that  the  state  of  the  environment  at  time  t  is  measured  by  the 
following: 

1. Potable water supplied ($X_{1t}$)

2. Electric power supplied ($X_{2t}$)

3. Number of violent incidents ($Z_t$)

which  is  a  mixture  of  continuous  and  discrete  environmental  state  variables.  Suppose  that  the 
following  was  determined  by  subject  matter  expert  input: 

$$\mu_{(12),\text{'Hostile'}} = [5,\ 95] \quad \text{and} \quad \lambda_{(12),\text{'Hostile'}} = 15$$

$$\mu_{(12),\text{'Neutral'}} = [10,\ 100] \quad \text{and} \quad \lambda_{(12),\text{'Neutral'}} = 5$$

$$\mu_{(12),\text{'Friendly'}} = [15,\ 105] \quad \text{and} \quad \lambda_{(12),\text{'Friendly'}} = 0.01$$

and $\Sigma_{(12),\text{'Hostile'}} = \Sigma_{(12),\text{'Neutral'}} = \Sigma_{(12),\text{'Friendly'}} = I_{2 \times 2}$ (corresponding to the two continuous variables).

For example, the mean vector $\mu_{(12),\text{'Hostile'}}$ tells us that the regional population is historically known
to turn hostile towards US Forces if potable water level drops below 5 units/day, electric power
drops below 95 units/day, and the number of violent incidents rises to 15/day. Similarly, with
respect to $\mu_{(12),\text{'Neutral'}}$, in order to keep the regional population in a neutral behavioral state, an
average of 10 units of potable water, 100 units of electric power, and 5 violent incidents are
required. A similar interpretation can be made for $\mu_{(12),\text{'Friendly'}}$. The specification of $\Sigma$ suggests
that equal importance is given to both continuous variables.
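
One way to see how these parameters discriminate between behavioral states is to evaluate the likelihood of a single day's observation under each state. The sketch below is illustrative only. It assumes the continuous pair and the count are conditionally independent given the state, so the joint likelihood is a bivariate normal density with identity covariance multiplied by a Poisson probability. That factorisation, the uniform prior, and the function names are assumptions, not details confirmed by the report.

```python
import math

# Parameters from the text: mean vector for (potable water, electric power)
# and Poisson mean for violent incidents; covariance is the 2x2 identity.
PARAMS = {
    "Hostile": {"mu": (5.0, 95.0), "lam": 15.0},
    "Neutral": {"mu": (10.0, 100.0), "lam": 5.0},
    "Friendly": {"mu": (15.0, 105.0), "lam": 0.01},
}


def log_normal_identity(x, mu):
    """Log density of a bivariate normal with identity covariance."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mu))
    return -math.log(2 * math.pi) - 0.5 * sq_dist


def log_poisson(k, lam):
    """Log of P(Z = k) for Z ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def posterior(prior, x, z):
    """Posterior over states given continuous x and count z.

    Assumes x and z are independent given the state (an illustrative choice).
    """
    log_post = {
        s: math.log(prior[s])
        + log_normal_identity(x, p["mu"])
        + log_poisson(z, p["lam"])
        for s, p in PARAMS.items()
    }
    peak = max(log_post.values())  # subtract the max to avoid underflow
    unnorm = {s: math.exp(v - peak) for s, v in log_post.items()}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}


# Assumed (hypothetical) uniform prior and a single made-up observation.
prior = {s: 1 / 3 for s in PARAMS}
result = posterior(prior, x=(10.5, 99.0), z=4)
for state, p in result.items():
    print(f"{state}: {p:.4f}")
```

Because the state means are five units apart on each continuous axis and the covariance is the identity, observations near one mean vector are assigned almost entirely to that state.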

For this example, the $X_t$'s were simulated from multivariate normal distributions, while
the $Z_t$'s were simulated from Poisson distributions. Using the same transition probability matrix
and initial behavioral state probability distribution as in the previous examples, we find affinities
over time as computed by MAMBU in Figure 11. Note that:

1. At time 66, the mean number of violent incidents increases.

2.  At  time  126,  the  average  units  of  potable  water  supplied  increases,  while  the  mean 
number  of  violent  incidents  decreases. 

3.  At  time  258,  the  average  units  of  electric  power  supplied  decreases. 

Notice that the increase in the mean number of violent incidents between times 66 and 125
results in a decrease in the affinity during this time. The decrease in electric power
supplied at time 258 likewise results in a decrease in affinity.

Figure 11: Affinity Profile of Regional Population Over Time as Function of a Mixture of Environmental Variable Types

The examples discussed in this subsection show how historical data and/or subject matter
expert knowledge can be exploited to predict affinities between groups contained in the OE over
time. They also demonstrate how MAMBU can be used as a basis for a behavioral model within
the NOEM framework.


Distribution A: Approved for public release; distribution is unlimited. 88ABW-2011-0110, 13 Jan 2011


5.0  CONCLUSIONS 


In this effort, two novel approaches were developed for predicting group responses or
constructs (e.g., attitudes and behavior) as a function of changes in relevant factors. The first is
purely statistical in nature and involves a hierarchical linear modeling approach. This approach
accounts for the fact that individuals are nested within groups, which induces non-independence
between individuals belonging to the same group. Independence of observations is a crucial
assumption when using standard methods of analysis, such as ANOVA and multiple linear
regression. As such, the use of standard statistical methods in these cases can severely mislead
the analyst into concluding that group-level factors are significant when in fact they are not, and
that individual-level factors are insignificant when in fact they have significant explanatory power.
We also proposed a novel data-based approach for determining an appropriate power
transformation on the response variable so that the distribution of the response variable better
agrees with the underlying model assumptions. We provided an example using simulated data
sets and demonstrated how the proposed modeling technique can be used to develop
meta-models for instantiation within the NOEM. The proposed approach can be used with either
observational data to facilitate correlation studies, or experimental data to facilitate causation
studies.

The second approach is probabilistic in nature. It assigns a probability distribution to the
behavioral state space of a group and then updates this distribution
using a Bayesian approach as new information becomes available. The method is most useful
when the conditional distributions of the environmental variables given the
behavioral state of the groups under study are estimable, either via historical observations or
subject matter expert opinion. When historical observations are available on each of the
independent variables for each behavioral state, standard statistical methods for performing
multiple comparisons can be used to determine if there are differences between the means of the
variables observed for a given behavioral state. This largely simplifies the analysis required,
relative to the hierarchical linear modeling approach discussed above.

LIST OF ACRONYMS


AFRL             Air Force Research Laboratory
ANOVA            Analysis of Variance
MAMBU            Markov Affinity Model with Bayesian Updates
NOEM             National Operational Environment Model
OE               Operating Environment
S.E.             Standard Error
TAGS             Technology for Agile Combat Support
US               United States
711 HPW/RHXS     Sensemaking & Organizational Effectiveness Branch



